Abstract: In addition to motor dysfunction,
Parkinson’s  disease  (PD)  often  results  in  symptoms  of 
cognitive  impairment  and  depression,  which  can  go 
underdiagnosed  and  undertreated.  One  approach  that 
may  improve  diagnosis  and  differentiation  of  motor, 
cognitive, and depressive symptoms of PD relies on vocal
acoustics, which have previously been used to predict
symptoms in each of these domains separately. In this
paper,  a  joint  multi-domain  characterization  of  the  PD 
symptoms  is  presented.  Speech  recordings  from  35  PD 
patients  were  analyzed  for  speech  markers  characterizing 
articulatory coordination based on resonant (formant)
frequencies and delta mel-frequency cepstral coefficients (dMFCCs),
as  well  as  phonemic  timing  based  on  phoneme-dependent 
speaking  rates.  Moderate  correlations  were  found 
between  vocal  markers  and  the  motor  and  cognitive 
symptoms  of  PD,  and  weaker  correlations  with  depressive 
symptoms.  We  identified  notable  differences  in  the 
correlation  patterns,  suggesting  it  may  be  possible  to 
distinguish  the  impact  of  different  PD  symptoms  on 
speech.  Statistical  models  based  on  the  vocal  markers 
achieved  moderate  accuracy  in  predicting  motor  severity 
(r=0.42)  and  global  cognition  (r=0.52)  but  not  depression 
(r=-0.21).  Future  study  is  warranted  to  further  develop 
symptom-specific  vocal  marker  models  in  PD. 

1.  Introduction 

Parkinson’s  disease  (PD)  is  the  second  most  common 
neurodegenerative  disease  and  affects  one  million 
Americans.  Although  PD  is  most  often  characterized  by  its 
motor  symptoms,  non-motor  symptoms  are  prevalent  and 
highly  impactful.  Common  non-motor  symptoms  include 
depression, which occurs in 35% of patients with PD [1], and
cognitive impairment, which occurs in 80% of patients
with PD over their disease course [2]. Depression and
cognitive  impairment  are  associated  with  more  severe 
disease  course,  impaired  quality  of  life,  and  increased 
mortality  [3-5]  and  may  interact  with  the  motor  features  of 
PD.  Depression  causes  psychomotor  slowing  and  flat  affect 
that  mimics  the  characteristic  bradykinesia  and  facial 
masking  that  occurs  in  PD.  Patients  with  PD  often  have  a 
monotonous  voice  with  diminished  prosody,  which  shares 
features  with  voice  changes  in  depression  [6,  7].  Likewise, 
cognitive  changes  in  PD  often  involve  executive  dysfunction, 
which  can  impact  verbal  fluency  [9],  as  well  as  gait  and 
balance [10]. As a result, current clinical assessment tools
are unable to clearly distinguish the impact of these non-motor
symptoms from that of motor symptoms on global function,
and non-motor symptoms are thus underdiagnosed and undertreated in PD [8].
Clinicians  need  objective  and  precise  tools  to  assess 
neuropsychiatric  non-motor  symptoms  in  PD.  Vocal  markers 
could  be  a  powerful  tool  to  meet  these  challenges  and  help  to 
disentangle  the  underlying  neurophysiology  of  mood, 
cognitive,  and  motor  impairment  in  PD. 

Although  there  has  been  growing  interest  in  using 
automated  voice  analysis  to  detect  and  monitor  PD  disease 
status  [11-18],  this  body  of  work  has  focused  largely  on 
motor  impairments  that  are  the  most  commonly  recognized 
changes  in  speech  in  PD.  These  include  imprecise 
articulation,  monotonous  and  reduced  pitch  and  volume, 
variable  speech  rate  and  pause  segments,  breathy  and  harsh 
voice  quality,  and  changes  in  intonation  and  rhythm  [11]. 
Almost  all  patients  with  PD  experience  motor  vocal  changes 
over  their  disease  course.  Vocal  markers  based  on  these 
changes  can  distinguish  PD  from  healthy  controls  [12-14] 
and predict PD disease severity [14-18]. However, this work,
in  focusing  on  motor  symptoms,  has  not  accounted  for 
affective  and  cognitive  influences  on  speech.  To  develop 
useful  vocal  markers  in  PD,  the  motor,  cognitive  and 
affective  components  of  speech  need  to  be  better 
distinguished  and  understood. 

Motor  and  neuropsychiatric  deficits  in  PD  are  thought  to 
reflect different underlying neurochemical and
neuropathological etiologies. Motor symptoms are due to
degeneration  of  dopaminergic  neurons  in  the  substantia  nigra, 
with  resulting  dopaminergic  deficit  in  the  basal  ganglia. 
Depression  in  PD  is  a  more  heterogeneous  process,  involving 
dopamine [19], serotonin [20], and norepinephrine [21]
depletion, and structural changes in limbic regions [22, 23].
Cognitive  impairment  is  related  in  part  to  a  dopamine  deficit, 
which  causes  dysfunction  of  the  cognitive  pathways 
connecting  the  frontal  lobes  and  basal  ganglia.  This  impacts 
executive  function,  notably  including  lexical  retrieval  and 
processing  speed  [24,  25].  However,  many  other  factors  play 
a  role  in  cognition  in  PD  including  other  neurotransmitter 
levels,  and  both  PD-related  Lewy  pathology  and  amyloid 
protein deposition in the cortex [26]. Additionally, functional
neuroimaging  has  revealed  distinct  patterns  of  resting  state 
metabolism  associated  with  cognitive  and  depressive 
symptoms  in  PD  [27-29].  We  therefore  sought  to  identify 
vocal  markers  of  depression  and  cognition  in  PD  reflective 
of  these  differences  in  pathophysiology. 

We  have  previously  identified  vocal  markers  of 
depression  in  the  general  population  [30-32],  utilizing 
changes  in  phonation,  articulation  and  prosody  known  to 
occur  across  different  time  scales  and  speech  segments  to 
reflect  the  underlying  neurophysiology  of  speech  production. 
In  this  previous  work,  we  used  the  correlation  structure  of 
formant  frequencies  and  delta-mel  frequency  cepstral 
coefficients  (dMFCCs)  to  represent  underlying  changes  in 
vocal  tract  shape  and  dynamics,  as  well  as  phoneme- 
dependent  speaking  rates,  to  predict  symptom  severity  in 
major  depressive  disorder.  Using  a  similar  approach,  we  have 
also  previously  identified  vocal  markers  associated  with 
cognitive  impairment,  verbal  fluency,  and  cognitive  load  in 
the  general  population  [33-35].  In  PD,  linguistic  changes 
associated  with  cognitive  impairment  include  decreased 
speech  rate,  increased  pausing  between  utterances,  and 
impaired  grammar,  but  vocal  acoustics  related  to  these 
symptoms  have  not  been  explored  with  a  data-driven 
approach [36, 37].

In  the  current  work,  we  apply  our  timing-  and 
coordination-based  vocal  acoustic  feature  sets  and  data- 
driven  analytical  approach  in  patients  with  PD  to  predict 
motor, cognitive, and depressive symptoms. We
hypothesize that these PD symptoms will correlate with
specific, identifiable changes in vocal tract shape and
dynamics, and that these correlations will depend on
articulatory and phonetic categories. Additionally, we aim
to explore the differential impact of motor impairment,
cognitive impairment, and depression on PD speech. We
hypothesize that while some features will overlap across
these domains, others will be
distinctly  correlated  with  each  domain.  Our  long-term  goal  is 
to  establish  symptom-specific  feature  clusters  that  will  fuel 
the  growing  field  of  vocal  biomarkers  in  PD. 


2.  Methods 

2.1.  Audio  Recordings 

We  studied  35  PD  patients,  who  were  enrolled  at  the 
Perelman  School  of  Medicine  at  the  University  of 
Pennsylvania.  PD  was  diagnosed  according  to  published 
criteria  [38].  All  subjects  completed  an  informed  consent 
procedure  in  accordance  with  the  Declaration  of  Helsinki  and 
approved  by  the  Institutional  Review  Board  of  the  University 
of  Pennsylvania. 

Motor deficits were characterized by the Unified
Parkinson’s Disease Rating Scale (UPDRS) obtained within 6
months  of  speech  testing.  Global  cognition  was  assessed 
with  the  Montreal  Cognitive  Assessment  (MoCA).  Patients 
with  dementia  were  excluded  based  on  MoCA  score  <24  [39]. 
Depressive symptoms were assessed with the Geriatric
Depression Scale (GDS) [40]. Table I summarizes basic
assessment  statistics  from  the  35  patients,  and  Table  II 
indicates  between-outcome  Spearman  correlations. 


TABLE I. Statistics on assessments used to measure PD
symptoms in motor (UPDRS), cognitive (MoCA), and
depression (GDS) domains.

Assessment   Min   Max   Mean    Std. Dev.
UPDRS        3     53    20.46   10.45
MoCA         24    30    27.31   2.03
GDS          0     13    2.46    2.94

TABLE II. Spearman correlations between assessment
outcome measures.

Assessments     r       p
UPDRS & MoCA    -0.38   0.02
UPDRS & GDS     0.27    0.12
MoCA & GDS      0.01    0.94

2.2.  Feature  Extraction 

We  utilize  three  feature  sets,  described  below,  which 
have  been  shown  in  our  previous  research  to  be  predictive  of 
PD  motor  severity  [17],  major  depressive  disorder  [30-32], 
and  cognitive  performance  [33,  34].  These  feature  sets 
characterize  articulatory  coordination  based  on  the  dynamics 
of  vocal  resonant  frequencies  and  spectral  properties,  as  well 
as  changes  in  vocal  timing  at  the  phoneme  level. 

Articulatory  Coordination 

Formant  frequency  tracks.  Properties  of  vocal  tract 
resonances  over  time  contain  information  about  speech 
dynamics  related  to  articulatory  properties  of  the  depressed 
voice. A formant tracking algorithm based on Kalman
filtering was used to obtain smooth estimates of the first
three  resonant  frequencies  over  time  [42].  Formant 
frequencies  were  extracted  every  10  ms  from  the  audio 
signal.  Embedded  in  the  formant  tracking  algorithm  is  a 
voice-activity  detector  that  allows  a  Kalman  smoother  to 
smoothly  coast  through  non-speech  regions.  Estimates  of  the 
third formant that went above a threshold of 4.5 kHz were
truncated. 

Articulatory  coordination  was  estimated  from  formant  tracks 
using  correlation  structure  features.  In  this  feature  approach, 
a  channel-delay  correlation  matrix  is  computed  from  the 
formant  tracks  using  time-delay  embedding.  The  correlation 
matrix  has  dimensionality  (45  x  45),  based  on  three  formant 
channels  and  15  time  delays  per  channel.  A  single  delay 
scale with 7-frame (70 ms) delay spacing is used. From the
correlation  matrix  a  45-dimensional  rank  ordered 
eigenspectrum  is  computed,  which  characterizes  the  within- 
channel  and  cross-channel  distributional  properties  of  the 
multivariate  formant  time  series. 
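
The channel-delay correlation approach described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the implementation used in this study: the function names, the use of NumPy, and the random-walk signals standing in for real formant tracks are all hypothetical choices made for illustration. Only the three channels, 15 delays, and 7-frame spacing come from the text.

```python
import numpy as np

def delay_embed(tracks, n_delays=15, spacing=7):
    """Stack time-delayed copies of every channel.

    tracks: array of shape (n_channels, n_frames), one row per formant.
    Returns an array of shape (n_channels * n_delays, n_valid_frames).
    """
    tracks = np.asarray(tracks, dtype=float)
    n_channels, n_frames = tracks.shape
    n_valid = n_frames - (n_delays - 1) * spacing
    if n_valid < 2:
        raise ValueError("recording is too short for this embedding")
    rows = [tracks[c, d * spacing : d * spacing + n_valid]
            for c in range(n_channels) for d in range(n_delays)]
    return np.vstack(rows)

def correlation_eigenspectrum(tracks, n_delays=15, spacing=7):
    """Rank-ordered eigenvalues of the channel-delay correlation matrix."""
    embedded = delay_embed(tracks, n_delays, spacing)
    corr = np.corrcoef(embedded)          # each row is one delayed channel
    eigvals = np.linalg.eigvalsh(corr)    # returned in ascending order
    return eigvals[::-1]                  # largest eigenvalue first

# Hypothetical stand-in for three formant tracks sampled every 10 ms.
rng = np.random.default_rng(0)
formants = np.cumsum(rng.standard_normal((3, 500)), axis=1)
eigenspectrum = correlation_eigenspectrum(formants)
```

Because the matrix is a correlation matrix, its eigenvalues sum to its dimension (45 here), so the eigenspectrum describes how variance is distributed across components rather than its absolute scale.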

Delta  Mel  Frequency  Cepstral  Coefficients.  To  introduce 
vocal  tract  spectral  magnitude  information,  a  standard  set  of 
16  Mel  Frequency  Cepstral  Coefficients  (MFCCs)  was 
generated by openSMILE from segmented but otherwise
unprocessed audio files [43]. Delta MFCCs (dMFCCs) were
then  computed,  which  reflect  dynamic  velocities  of  the 
MFCCs  over  time.  Delta  coefficients  were  computed  using 
regression over a 5-frame window. A channel-delay
correlation  matrix  was  computed  from  the  dMFCCs  using 
time-delay  embedding,  with  dimensionality  (240  x  240), 
based  on  16  dMFCC  channels  and  15  delays  per  channel 
with 1-frame (10 ms) delay spacing. From this matrix the
240-dimensional  rank-ordered  eigenspectrum  was  computed, 
which  characterizes  the  within-channel  and  cross-channel 
distributional  properties  of  the  multivariate  dMFCC  time 
series. 
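
For readers unfamiliar with delta coefficients, the sketch below shows one common way to compute them by regression over a 5-frame window (two frames on either side of the current frame). The exact openSMILE configuration is not given here, so the regression formula and the edge handling (repeating the first and last frames) are assumptions, and the function name is hypothetical.

```python
def delta(frames, half_width=2):
    """Regression-based delta coefficients.

    frames: list of per-frame coefficient lists (e.g. 16 MFCCs per frame).
    half_width=2 gives a 5-frame window. Frames beyond either end of the
    recording are replaced by the nearest valid frame (an assumption).
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, half_width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, half_width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Two toy "coefficients" that rise linearly at 1 and 2 units per frame.
ramp = [[float(t), 2.0 * t] for t in range(10)]
ramp_delta = delta(ramp)
```

On a linear ramp the interior delta values equal the slope of each coefficient, which is a quick sanity check that the regression weights are correct.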

Phonetic Timing. We have found that computing
phoneme-specific characteristics, rather than average measures of
speaking rate, reveals stronger relationships between speech
rate and depression severity [31, 45]. Using an automatic
phoneme recognition algorithm [44], we detect phonetic
boundaries and phoneme-specific durations that are
associated  with  each  instance  of  the  40  classes  of  defined 
phonetic  speech  units.  Consistent  with  previous  work  on  free 
speech  recordings  that  have  variable  total  durations,  the 
summed  durations  of  each  phoneme  are  normalized  by  the 
total  number  of  phonemes  to  produce  a  phoneme  rate 
measure. 

Based  on  previous  work  [31,  45],  for  the  phonemes  that  have 
rates  that  are  highly  correlated  with  an  outcome  measure 
(UPDRS,  MoCA,  or  GDS)  on  the  training  set,  the  rates  are 
linearly  combined  to  yield  a  fused  phoneme  rate  measure, 
with  the  sign  of  the  combination  weight  based  on  the 
correlation sign. Consistent with [45], we use binary weights
of 1 or -1 to combine the rates of the selected phonemes. In
the current work, we select the top eight correlating
phonemes, since eight is the average of the numbers of
phonemes (six and ten) used on two different recorded
passages in [31].
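
The phoneme rate measure and its fusion can be sketched as follows. This is an illustrative reading of the description above, not the study code: the rate is computed literally as the summed duration of each phoneme divided by the total number of phoneme instances, Pearson correlation is assumed for ranking phonemes (the type of correlation used for selection is not stated), and all function names and toy values are hypothetical.

```python
import math
from collections import defaultdict

def phoneme_rates(segments, phoneme_set):
    """segments: list of (phoneme_label, duration_seconds) for one recording."""
    totals = defaultdict(float)
    for label, duration in segments:
        totals[label] += duration
    n_instances = len(segments)
    return [totals[p] / n_instances for p in phoneme_set]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def fit_fusion_weights(train_rates, train_scores, top_k=8):
    """Pick the top_k phonemes by |correlation| on the training set only
    and give each a weight of +1 or -1 matching the correlation sign."""
    n_phonemes = len(train_rates[0])
    corrs = [pearson([r[j] for r in train_rates], train_scores)
             for j in range(n_phonemes)]
    chosen = sorted(range(n_phonemes), key=lambda j: abs(corrs[j]),
                    reverse=True)[:top_k]
    return {j: (1 if corrs[j] >= 0 else -1) for j in chosen}

def fused_rate(rates, weights):
    return sum(w * rates[j] for j, w in weights.items())

# Toy example with three phoneme classes and four training subjects.
example_rates = phoneme_rates(
    [("sil", 0.4), ("ah", 0.1), ("ah", 0.2), ("z", 0.1)], ["sil", "ah", "z"])
train_rates = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.3], [0.3, 0.3, 0.1], [0.4, 0.1, 0.2]]
train_scores = [1.0, 2.0, 3.0, 4.0]
weights = fit_fusion_weights(train_rates, train_scores, top_k=2)
fused = fused_rate([0.2, 0.4, 0.3], weights)
```

Fitting the weights on training subjects only and then applying them unchanged to a held-out subject keeps the phoneme selection from leaking test information.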

2.3.  Correlational  Analysis 

The  vocal  markers  consist  of  high  dimensional  feature 
vectors: 45-dimensional formant eigenspectra, 240-dimensional
dMFCC eigenspectra, and 40-dimensional
phoneme  rate  measures.  As  a  first  level  of  analysis,  the 
correlations  of  the  raw  feature  elements  with  the  three  PD 
symptom  outcomes  are  measured.  Specifically,  correlations 
are  measured  with  motor  symptoms  (UPDRS  scores), 
cognitive  impairment  symptoms  (MoCA  scores),  and 
depression  symptoms  (GDS  scores).  Because  MoCA  scores 
are negative indicators of impairment, correlations are computed
with  negative  MoCA  scores,  for  sign  consistency  with  the 
UPDRS  and  GDS  correlations. 
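
The sign convention can be made concrete with a small rank-correlation sketch. The helper names and the toy values below are hypothetical, and ties are handled with average ranks, which is the usual convention but is assumed here rather than stated in the text.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Toy feature that falls as cognition worsens (lower MoCA = more impaired).
feature = [0.9, 0.7, 0.8, 0.4, 0.2]
moca = [30, 28, 29, 26, 24]
r_raw = spearman(feature, moca)                    # versus raw MoCA
r_severity = spearman(feature, [-m for m in moca]) # versus negated MoCA
```

Negating MoCA flips the sign of the correlation, so a negative value means "feature decreases as impairment increases" for all three outcomes.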

2.4.  Outcome  Prediction 

Regression  Model.  A  standard  statistical  approach  called 
Gaussian  staircase  (GS)  regression  [17,  30,  31]  is  used  for 
prediction.  GS  generalizes  the  use  of  a  Gaussian  classifier 
for  regression  into  an  ensemble  of  Gaussians  for  each  class 
(Class  1  =  “lower”,  Class  2  =  “higher”),  based  on 
partitioning  of  the  range  of  values  of  the  outcome  variable. 
The  ensemble  of  Gaussians  associated  with  each  class  is 
interpreted  as  a  Gaussian  mixture  model,  such  that  the  Class 
1  or  Class  2  likelihood  for  a  test  data  point  is  the  sum  of 
likelihoods  from  the  Gaussian  ensemble  associated  with  that 
Class.  The  outcome  prediction  is  then  obtained  using 
univariate  regression  based  on  the  two-class  log-likelihood 
scores,  with  the  regression  model  constructed  from  the 
training  set  log-likelihood  scores. 

For  this  work,  GS  levels  were  determined  for  each  outcome 
variable as follows. The UPDRS Class 1 partitions were
[0-11, 0-16, 0-21, 0-26], the MoCA Class 1 partitions were
[0-25, 0-26, 0-27, 0-28, 0-29], and the GDS Class 1 partitions
were [0-0, 0-1, 0-2, 0-3, 0-4]. For each symptom domain, the
Class  2  partitions  are  simply  the  complement  of  the  Class  1 
partitions. Regularization was done by adding a value
of 10 to the diagonal elements of the Gaussian
covariance matrices. Second-order univariate regression was
used to map log-likelihoods to outcome predictions.
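
The following Python sketch shows one way the GS procedure described above could be put together. It is written from the description in this section only and should be read as an approximation under assumptions: the class and function names are hypothetical, each partition contributes one full-covariance Gaussian per class, partitions that leave a class empty are skipped, and the synthetic data exist only to exercise the code. The UPDRS partition upper bounds (11, 16, 21, 26), the diagonal regularization of 10, and the second-order regression follow the text.

```python
import numpy as np

def fit_gaussian(X, reg):
    """Mean and diagonally regularized covariance of the rows of X."""
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / max(len(X) - 1, 1)
    return mean, cov + reg * np.eye(X.shape[1])

def log_gaussian(X, mean, cov):
    """Log density of each row of X under N(mean, cov)."""
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + maha)

def log_sum_likelihood(X, gaussians):
    """Log of the summed likelihoods over an ensemble of Gaussians."""
    logs = np.array([log_gaussian(X, m, c) for m, c in gaussians])
    peak = logs.max(axis=0)
    return peak + np.log(np.exp(logs - peak).sum(axis=0))

class GaussianStaircase:
    def __init__(self, thresholds, reg=10.0, degree=2):
        self.thresholds, self.reg, self.degree = thresholds, reg, degree

    def score(self, X):
        """Class 2 ("higher") minus Class 1 ("lower") log-likelihood."""
        X = np.asarray(X, dtype=float)
        return (log_sum_likelihood(X, self.high_)
                - log_sum_likelihood(X, self.low_))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.low_, self.high_ = [], []
        for t in self.thresholds:
            low = y <= t
            if low.any() and (~low).any():  # skip partitions with an empty class
                self.low_.append(fit_gaussian(X[low], self.reg))
                self.high_.append(fit_gaussian(X[~low], self.reg))
        if not self.low_:
            raise ValueError("no threshold splits the training outcomes")
        # Polynomial map from training-set scores to the outcome variable.
        self.poly_ = np.polyfit(self.score(X), y, self.degree)
        return self

    def predict(self, X):
        return np.polyval(self.poly_, self.score(X))

# Synthetic data only: 60 "subjects" with 4 features that track the outcome.
rng = np.random.default_rng(0)
y = rng.integers(3, 54, size=60).astype(float)
X = 0.5 * y[:, None] + rng.standard_normal((60, 4))
model = GaussianStaircase(thresholds=[11, 16, 21, 26]).fit(X, y)
pred = model.predict(X)                   # training-set fit, optimistic
train_r = np.corrcoef(pred, y)[0, 1]
```

The training-set correlation computed here is only a smoke test; an honest performance estimate requires held-out subjects.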

Dimensionality  Reduction.  In  order  to  avoid  possible 
feature selection biases on a relatively small (35-subject)
dataset, we adopted the same feature selection parameters
that were used in the depression prediction system that won
first place in the AVEC 2014 depression prediction
sub-challenge [31]. Specifically, the first four principal
component features were used for the formant correlation
structure features, and the first five principal components for
the  delta-MFCC  correlation  structure  features. 

In  the  AVEC  2014  system,  phoneme  rate  features  were  used 
on  two  different  passages,  a  read  passage  and  a  free  speech 
passage.  The  top  six  correlating  phonemes  were  combined  in 
the  read  passage,  and  the  top  ten  phonemes  in  the  free 
speech  passage.  Here,  we  split  the  difference,  using  the  top 
eight  correlating  phonemes. 

Cross-validation.  In  order  to  obtain  an  unbiased  estimate  of 
generalization  performance,  a  cross-validation  procedure  is 
used  such  that  statistical  models  are  trained  on  rotating  data 
subsets,  and  applied  to  held-out  test  data,  with  no  overlap  in 
subject  identity  between  training  and  test  sets.  Within  this 
procedure,  all  transformations  that  depend  on  features 
obtained  across  multiple  recordings,  such  as  z-scoring,  PCA, 
and  correlation-based  phoneme  rate  aggregation,  are 
computed  strictly  within  the  training  set  and  then  applied  to 
the  held-out  test  set. 

One  shortcoming  of  small  data  sets  is  that  randomly 
partitioned cross-validation folds, or leave-one-subject-out
cross-validation, can result in negatively biased estimates, since the
training  set  outcome  variables  tend  to  be  negatively 
correlated  with  the  test  set  outcome  variable.  To  avoid  this 
complication,  a  12-fold  stratified  sampling  cross-validation 
procedure  is  used,  in  which  the  expected  value  of  the 
outcome  variable  is  kept  as  consistent  as  possible  across  the 
different  test  folds. 
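
A possible realization of this cross-validation scheme is sketched below. The fold assignment rule is not specified in the text, so the serpentine dealing of outcome-sorted subjects into folds is an assumption chosen to keep fold means similar, and the function names are hypothetical. Z-scoring stands in for any transformation that must be fitted on the training set alone.

```python
def stratified_folds(scores, n_folds=12):
    """Deal subjects, sorted by outcome, into folds in serpentine order
    so that each fold spans the outcome range (one score per subject)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    folds = [[] for _ in range(n_folds)]
    for rank, idx in enumerate(order):
        block, pos = divmod(rank, n_folds)
        fold = pos if block % 2 == 0 else n_folds - 1 - pos
        folds[fold].append(idx)
    return folds

def zscore_params(rows):
    n = len(rows)
    dims = range(len(rows[0]))
    means = [sum(r[j] for r in rows) / n for j in dims]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for j in dims]
    return means, stds

def zscore_apply(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def cross_validate(features, scores, n_folds=12):
    """Yield (train_idx, test_idx, train_z, test_z) for each fold, with
    normalization statistics computed on the training subjects only."""
    for test_idx in stratified_folds(scores, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(scores)) if i not in held_out]
        means, stds = zscore_params([features[i] for i in train_idx])
        train_z = zscore_apply([features[i] for i in train_idx], means, stds)
        test_z = zscore_apply([features[i] for i in test_idx], means, stds)
        yield train_idx, test_idx, train_z, test_z

# Toy data: 35 subjects, two features per subject.
scores = [float(s) for s in range(35)]
features = [[s, 2 * s + 1] for s in scores]
splits = list(cross_validate(features, scores))
```

The same pattern extends to PCA and phoneme selection: fit on the training indices, then apply the fitted transform to the held-out indices.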

3.  Results 

3.1.  Correlational  Analysis 

Figure 1 shows the Spearman correlations of the
formant eigenvalue features (top) and the dMFCC
eigenvalue features (bottom) with the three outcomes. The
eigenvalues  are  rank  ordered  left  to  right  from  largest  to 
smallest.  Overall,  smaller  formant  eigenvalues  are 
negatively  correlated  with  symptom  severity,  and  smaller 
dMFCC  eigenvalues  are  positively  correlated  with  symptom 
severity.  This  correlation  pattern  is  stronger  for  motor  and 
cognitive  symptoms,  and  weaker  for  depressive  symptoms. 

This  pattern  of  results  is  consistent  with  previous  work 
showing  similar  patterns  in  formant-  and  dMFCC-based 
eigenvalue  features  for  motor  symptoms  in  Parkinson’s 
disease  [17]  and  for  symptoms  of  major  depressive  disorder 
[30, 31]. Similar results have also been found in
formant-based eigenvalue features for symptoms of reduced
cognitive  performance  related  to  aging  [34]  and  possible 
mTBI  [33]. 

Despite  the  overall  similarities  in  correlation  patterns 
among  the  various  PD  symptom  domains,  there  are  some 
notable  differences.  First,  the  depression  symptoms  show 
smaller absolute correlation levels. This may be due to the
low and restricted range of GDS scores (see Table I),
indicating  minimal  depressive  symptoms  are  present  in  this 
cohort.  Second,  there  are  notable  shifts  in  the  UPDRS  versus 
MoCA  correlation  patterns.  For  the  formant-based 
eigenvalues,  there  are  stronger  negative  correlations  with 
UPDRS  among  eigenvalues  5  through  20.  For  the  dMFCC- 
based  eigenvalues,  on  the  other  hand,  there  are  stronger 
correlations  with  MoCA  among  eigenvalues  20  through  40, 
and  stronger  correlations  with  UPDRS  among  eigenvalues 
54-140  and  220-240.  These  differences  indicate  that  there 
may  be  subtle  differences  in  the  effects  of  motor  and 
cognitive  symptoms  on  articulatory  speech  dynamics. 

Figure 2 shows the Spearman correlations between
phoneme rates and the UPDRS and MoCA outcomes. For clarity,
correlations with GDS are not
included,  as  these  correlations  were  considerably  smaller. 
The  phonemes  are  divided  into  three  categories:  1)  phonemes 
with  similar  outcome  correlations  (left);  2)  strong  UPDRS 
and  weak  MoCA  correlations  (center);  3)  weak  UPDRS  and 
strong  MoCA  correlations  (right).  In  the  first  category  are 
phonemes  indicating  that  motor  and  cognitive  symptoms  are 
positively  associated  with:  slower  speech  planning  and 
execution  (‘sil’,  ‘ah’),  slower  labial,  labial/dental,  and/or 
tongue/dental movements (‘uw’, ‘th’, ‘b’, ‘v’), and faster
execution of the diphthong vowel ‘oy’. In the second category
are two open-vowel phonemes (‘ae’, ‘ey’) that require large
jaw movements, for which faster rates are positively
associated with motor symptoms only. For the phonemes
‘oy’, ‘ae’, and ‘ey’, the positive association between faster
rates and symptom severity could be due to shorter or less
complete motor trajectories. Finally, in the third category are
three consonants (‘ng’, ‘jh’, ‘z’), which require precise
tongue articulation, for which slower rates are positively
associated with cognitive symptoms only.

Figure 1. Spearman correlations of formant-based (top) and
dMFCC-based (bottom) eigenvalue features with three PD
symptom outcomes. Eigenvalues are ordered, largest to
smallest, from left to right, and MoCA is sign-adjusted.
[Top panel title: Formant Corr. Structure; x-axis: Eigenvalue Index.]

Figure 2. Spearman correlations of phoneme rates with
motor and cognitive impairment symptoms shown for those
phonemes that have |r| > 0.28 for either outcome.
[Panel title: Phoneme Rates; y-axis from -0.6 to 0.6; x-axis
phonemes: sil, ah, uw, th, b, v, oy, ae, ey, ng, jh, z.]

3.2.  Outcome  Prediction 

Table  III  shows  the  correlations  obtained  in  predicting  the 
three  outcome  variables  based  on  each  individual  feature  set 
using  12-fold  stratified  sampling.  It  also  shows  the  results 
obtained  by  combining  the  three  feature  sets,  which  was 
done  by  adding  the  GS  log-likelihood  ratios  across  the  three 
feature  sets,  prior  to  the  univariate  regression  step.  The 
dMFCC  feature  set  obtained  the  strongest  correlations  for 
both  motor  and  cognitive  symptoms.  For  cognitive 
symptoms,  predictions  combining  the  three  feature  sets 
improved  prediction  performance.  None  of  the  feature  sets 
were  useful  for  predicting  depression  symptoms.  Figure  3 
illustrates  the  predicted  motor  assessments  (top)  and 
cognitive  assessments  (bottom)  as  a  function  of  true 
assessment  values  for  the  combined  system. 
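
The fusion step can be sketched in a few lines. The example assumes that each feature set has already produced one log-likelihood ratio per subject; the function names and the numbers are hypothetical and are not results from this study.

```python
import numpy as np

def fit_fusion(train_scores_by_set, train_outcomes, degree=2):
    """Sum per-feature-set log-likelihood ratios, then fit the
    polynomial map from the fused score to the outcome."""
    fused = np.sum(np.asarray(train_scores_by_set, dtype=float), axis=0)
    return np.polyfit(fused, np.asarray(train_outcomes, dtype=float), degree)

def predict_fusion(poly, test_scores_by_set):
    fused = np.sum(np.asarray(test_scores_by_set, dtype=float), axis=0)
    return np.polyval(poly, fused)

# Toy log-likelihood ratios for five training subjects (rows: feature sets).
train_llr = [
    [-2.0, -1.0, 0.0, 1.0, 2.0],   # formant-based
    [0.0, 1.0, -1.0, 2.0, 0.0],    # dMFCC-based
    [1.0, 0.0, 0.0, -1.0, 2.0],    # phoneme-rate-based
]
train_outcomes = [17.5, 20.0, 17.5, 28.0, 40.0]
poly = fit_fusion(train_llr, train_outcomes)
fused_pred = predict_fusion(poly, [[1.0], [0.0], [0.0]])
```

Summing log-likelihood ratios treats the feature sets as independent sources of evidence, and a single regression is then fitted on the fused score.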


TABLE III. Spearman correlations between predicted and
true symptom assessment scores, based on vocal markers.

Feature Sets   UPDRS r (p)    MoCA r (p)     GDS r (p)
Formants       0.20 (0.24)    0.16 (0.35)    -0.04 (0.84)
dMFCC          0.43 (0.01)    0.39 (0.02)    0.02 (0.92)
Phonemes       0.13 (0.46)    0.29 (0.09)    -0.45 (0.01)
Combined       0.42 (0.01)    0.52 (0.00)    -0.21 (0.23)

Figure 3. Fused system predictions are plotted as a function
of true values for UPDRS (top; r=0.42) and MoCA (bottom;
r=0.52).

4.  Discussion 

Using  high-level  acoustic  features,  we  identified  vocal 
markers  of  neuropsychiatric  symptoms  in  PD.  Our  approach 
adds  to  the  field  of  vocal  biomarker  assessment  in  several 
ways.  First,  we  have  identified  vocal  markers  associated  with 
depression  and  cognition  in  PD  for  the  first  time  to  our 
knowledge.  Second,  we  established  that  different 
characteristic  values  of  formant  and  dMFCC  correlation 
structure,  and  of  phonemic  durations  and  categories,  are 
correlated  with  motor  and  non-motor  symptoms.  Our  results 
were  more  robust  for  cognition  compared  with  depression. 
Future  work  is  needed  to  explore  the  elements  of  the  speech 
task  demands  and  the  optimal  features  to  better  assess 
depression. 

Strengths  of  our  approach  include  the  relative 
independence  from  patient  characteristics  such  as  sex  and 
age,  due  to  the  normalization  inherent  in  the  correlation 
feature  structure  approach.  This  approach  may  also  more 
reliably  control  for  variations  within  individuals  with  PD. 
Variability  in  symptoms  is  a  challenge  for  PD  biomarker 
development,  since  PD  symptoms  may  vary  based  on 
environment,  concurrent  motor  and/or  cognitive  load,  and 
medication effects. Another strength is the ecological validity
of our approach. We were able to demonstrate success using
an iPhone in a typical clinic room or home setting, rather than
a  lab  setting.  This  suggests  that  vocal  biomarker  monitoring 
may  be  feasible  without  expensive  or  intrusive  equipment, 
allowing  it  to  characterize  the  patient’s  unique  daily 
experience.  Weaknesses  of  our  study  include  the  limited 
range  of  depressive  symptom  severity  in  this  patient 
population.  Future  work  should  include  PD  patients  with 
more severe depressive symptoms. In addition, our sample
size was relatively small and included only patients recruited
at an academic medical center, so our results may not generalize
to other PD patient populations.

5.  Conclusion 

Vocal markers are a promising tool to assess both motor
and non-motor symptoms, including depressive and cognitive
symptoms, in patients with PD. This work supports the
feasibility  of  symptom-specific  feature  clusters  that  enhance 
the  further  development  of  vocal  biomarkers  in  PD  and 
suggests  correlation  dependence  on  articulatory  and  phonetic 
categories.  The  major  advantages  of  vocal  markers  are  that 
speech  can  be  tested  remotely  and  automatically,  allowing 
for  frequent  and  quantitative  symptom  assessment.  This 
approach  could  fuel  large-scale  screening  of  patients  and 
improved  monitoring  of  fluctuating  symptoms  during  daily 
activities,  as  well  as  monitoring  of  response  to  therapeutics. 
Further  research  is  warranted  using  multi-modality  feature 
analysis  with  additional  motor  and  affective  components  in 
order  to  better  detect  neuropsychiatric  symptoms  in  PD. 